A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier

Authors

  • Saman Khalandi Department of Computer Engineering, Urmia Branch, Islamic Azad University, Urmai, Iran.
Abstract:

With the fast increase of the documents, using Text Document Classification (TDC) methods has become a crucial matter. This paper presented a hybrid model of Invasive Weed Optimization (IWO) and Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS) in order to reduce the big size of features space in TDC. TDC includes different actions such as text processing, feature extraction, forming feature vectors, and final classification. In the presented model, the authors formed a feature vector for each document by means of weighting features use for IWO. Then, documents are trained with NB classifier; then using the test, similar documents are classified together. FS do increase accuracy and decrease the calculation time. IWO-NB was performed on the datasets Reuters-21578, WebKb, and Cade 12. In order to demonstrate the superiority of the proposed model in the FS, Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) have been used as comparison models. Results show that in FS the proposed model has a higher accuracy than NB and other models. In addition, comparing the proposed model with and without FS suggests that error rate has decreased.

Upgrade to premium to download articles

Sign up to access the full text

Already have an account?login

similar resources

Transferring Naive Bayes Classifiers for Text Classification

A basic assumption in traditional machine learning is that the training and test data distributions should be identical. This assumption may not hold in many situations in practice, but we may be forced to rely on a different-distribution data to learn a prediction model. For example, this may be the case when it is expensive to label the data in a domain of interest, although in a related but ...

full text

Naïve Bayes Classifier with Various Smoothing Techniques for Text Documents

Due to huge amount of increase in text data, its classification has become an important issue, now days. There are many good classification techniques discussed in this paper. Each classification method has its own assumptions, advantages and limitations. One of the most widely used classifier is Naïve Bayes which performs well with different data sets. Various Smoothing techniques are applied ...

full text

Text Classification using Association Rule with a Hybrid Concept of Naive Bayes Classifier and Genetic Algorithm

Text classification is the automated assignment of natural language texts to predefined categories based on their content. Text classification is the primary requirement of text retrieval systems, which retrieve texts in response to a user query, and text understanding systems, which transform text in some way such as producing summaries, answering questions or extracting data. Now a day the de...

full text

Bayes and Naive Bayes Classifier

Bayes and Naive-Bayes Classifier

full text

Poisson naive Bayes for text classification with feature weighting

In this paper, we investigate the use of multivariate Poisson model and feature weighting to learn naive Bayes text classifier. Our new naive Bayes text classification model assumes that a document is generated by a multivariate Poisson model while the previous works consider a document as a vector of binary term features based on the presence or absence of each term. We also explore the use of...

full text

Naive Bayes for Text Classification with Unbalanced Classes

Abstract. Multinomial naive Bayes (MNB) is a popular method for document classification due to its computational efficiency and relatively good predictive performance. It has recently been established that predictive performance can be improved further by appropriate data transformations [1, 2]. In this paper we present another transformation that is designed to combat a potential problem with ...

full text

My Resources

Save resource for easier access later

Save to my library Already added to my library

{@ msg_add @}


Journal title

volume 4  issue 3

pages  167- 184

publication date 2018-08-01

By following a journal you will be notified via email when a new issue of this journal is published.

Hosted on Doprax cloud platform doprax.com

copyright © 2015-2023